Skip to content

Conversation

@mshockwave
Copy link
Member

@mshockwave mshockwave commented Nov 7, 2025

Per our discussions in RISC-V roundtable during LLVM Dev Meeting 2025, there has been a consensus to create a base / generic scheduling model for the most common performance tuning usages. As the first step, we agree to use SpacemitX60's scheduling model for that purpose for the time being, with an expectation to create a standalone generic model that could evolve independently in the future.

This patch sets -mtune=generic to use SpacemitX60's scheduling model, and documents the rationale behind it, including the roadmap ahead as we discussed.


Update: the change was reflected on -mcpu=generic-rv32/64 as well.

Note that I didn't make the same change on -mcpu=generic-rv32/64, because it would invalidate most of, if not all of the codegen tests (we set TuneCPU to generic-rv32/64 when -mtune is empty). I would like to solicit opinions on whether we should do it or not.

For reference:

  • X86: -mtune=generic is equal to -mtune=sandy-bridge. But in the absent of -mtune, the -mtune is set of i586, which uses NoSchedModel. The rationale behind it is similar to what I mentioned above: setting to i586 can make sure it won't break any the codegen tests
  • AArch64: -mtune=generic is equal to -mtune=cortex-a510 -- same for the case where -mtune is empty. I guess they bit the bullet and updated all the test cases when they made that decision (i.e. to use cotext-a510 rather than NoSchedModel for codegen tests)

So I guess the question here is: should we update all affected codegen tests? On one hand it makes the tests less stable whenever we update SpacemitX60's scheduling model -- which is still relatively active. On the other hand, it's more straightforward if we maintain a consistency between generic and generic-rv32/64 -- both of them use the same scheduling model.

@llvmbot
Copy link
Member

llvmbot commented Nov 7, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

Per our discussions in RISC-V roundtable during LLVM Dev Meeting 2025, there has been a consensus to create a base / generic scheduling model for the most common performance tuning usages. As the first step, we agree to use SpacemitX60's scheduling model for that purpose for the time being, with an expectation to create a standalone generic model that could evolve independently in the future.

This patch sets -mtune=generic to use SpacemitX60's scheduling model, and documents the rationale behind it, including the roadmap ahead as we discussed.


Note that I didn't make the same change on -mcpu=generic-rv32/64, because it would invalidate most of, if not all of the codegen tests (we set TuneCPU to generic-rv32/64 when -mtune is empty). I would like to solicit opinions on whether we should do it or not.

For reference:

  • X86: -mtune=generic is equal to -mtune=sandy-bridge. But in the absent of -mtune, the -mtune is set of i586, which uses NoSchedModel. The rationale behind it is similar to what I mentioned above: setting to i586 can make sure it won't break any the codegen tests
  • AArch64: -mtune=generic is equal to -mtune=cortex-a510 -- same for the case where -mtune is empty. I guess they bit the bullet and updated all the test cases when they made that decision (i.e. to use cotext-a510 rather than NoSchedModel for codegen tests)

So I guess the question here is: should we update all affected codegen tests? On one hand it makes the tests less stable whenever we update SpacemitX60's scheduling model -- which is still relatively active. On the other hand, it's more straightforward if we maintain the consistency between generic and generic-rv32/64 -- both of them use the same scheduling model.


Full diff: https://github.com/llvm/llvm-project/pull/167008.diff

3 Files Affected:

  • (modified) llvm/docs/RISCVUsage.rst (+13)
  • (modified) llvm/docs/ReleaseNotes.md (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+6-1)
diff --git a/llvm/docs/RISCVUsage.rst b/llvm/docs/RISCVUsage.rst
index d03f383a92b3b..e213db332cf38 100644
--- a/llvm/docs/RISCVUsage.rst
+++ b/llvm/docs/RISCVUsage.rst
@@ -601,3 +601,16 @@ Clang's ``-msmall-data-limit=`` option controls what the threshold size is (in b
 The small data limit threshold is also used to separate small constants into sections with names starting with ``.srodata``. LLD does not place these with the ``.sdata`` and ``.sbss`` sections as ``.srodata`` sections are read only and the other two are writable. Instead the ``.srodata`` sections are placed adjacent to ``.rodata``.
 
 Data suggests that these options can produce significant improvements across a range of benchmarks.
+
+Scheduling Model and Tuning
+===========================
+
+RISC-V is highly configurable, meaning its scheduling models could be highly diversified as well. Yet we still believe it is helpful to provide a "generic" tuning processor / scheduling model that represents the "lowest common denominator" RISC-V implementation at the time. The idea is that it could serve as a "good-enough" baseline model for performance tuning purposes on some of the most common use cases.
+
+Though details of this generic scheduling model might evolve over time, we always have some _expectations_ on the kind of processors it is used for.
+
+For example, the ``generic`` tuning processor is expected to target in-order, superscalar application processors designed for general-purpose computing. It is usually RVA22U64- or RVA23U64-capable intended to run Linux. The ``generic-ooo`` has a similar set of expectations, except it is targeting out-of-order application processors.
+
+Right now, we simply assign a scheduling model that is widely used by the community to ``generic``. But in the future, we can create a standalone scheduling model for ``generic``, or even create a generic model for each of the individual sectors. For example, a ``generic-embedded`` for embedded processors and a ``generic-server`` for server workloads.
+
+These future generic models could even serve as the "base" model for other scheduling models to derive from: it's not uncommon for multiple processors to share a similar set of instruction scheduling info except a few key instructions, and this is especially true for RISC-V given its highly configurable nature. If we could design the base model in a way that it can be _parameterized_ by subtarget tuning features, we can substitue the traditional way of creating individual scheduling models with a combination of base scheduling model + different subtarget features.
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 23bba99ec874f..d330120214705 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -138,6 +138,7 @@ Changes to the RISC-V Backend
 * Adds experimental support for the 'Zibi` (Branch with Immediate) extension.
 * Add support for Zvfofp8min (OFP8 conversion extension)
 * Adds assembler support for the Andes `XAndesvsinth` (Andes Vector Small Int Handling Extension).
+* `-mtune=generic` now uses the scheduling model from SpacemitX60 instead of an empty scheduling model.
 
 Changes to the WebAssembly Backend
 ----------------------------------
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index e86431f78f1ba..9e31d08ae2243 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -102,7 +102,12 @@ def GENERIC_RV64 : RISCVProcessorModel<"generic-rv64",
                    GenericTuneInfo;
 // Support generic for compatibility with other targets. The triple will be used
 // to change to the appropriate rv32/rv64 version.
-def GENERIC : RISCVTuneProcessorModel<"generic", NoSchedModel>, GenericTuneInfo;
+// `generic` is expected to target in-order application processors designed for
+// general-purpose computing.
+def GENERIC : RISCVTuneProcessorModel<"generic", SpacemitX60Model>,
+              GenericTuneInfo;
+// `generic-ooo` is expected to target out-of-order application processors designed
+// for general-purpose computing.
 def GENERIC_OOO : RISCVTuneProcessorModel<"generic-ooo", GenericOOOModel>,
                   GenericTuneInfo;
 

def GENERIC : RISCVTuneProcessorModel<"generic", NoSchedModel>, GenericTuneInfo;
// `generic` is expected to target in-order application processors designed for
// general-purpose computing.
def GENERIC : RISCVTuneProcessorModel<"generic", SpacemitX60Model>,
Copy link
Collaborator

@topperc topperc Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't RISCVSubtarget.cpp always remap "generic" to "generic-rv32" or "generic-rv64"?

Please see description here edd6632

Copy link
Member Author

@mshockwave mshockwave Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, but it will not remap -mtune=generic.
So after this patch, -mtune=generic effectively becomes -mtune=spacemit-x60 while -mcpu=generic/generic-rv32/generic-rv64 stay the same (i.e. noschedmodel).

Now, if we want to make -mtune=generic and -mcpu=generic having the same behavior, then we need to update nearly all the test cases, hence the question I raised in the PR description.

Copy link
Collaborator

@topperc topperc Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been saying -mtune=generic, but I think what we actually mean was no -mtune provided. Meaning default behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no -mtune provided. Meaning default behavior.

fair enough, if that's the case I guess we have to bite the bullet and update all the codegen tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should just update the codegen tests, IMO.

Copy link
Collaborator

@topperc topperc Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok the behavior between clang and llvm is a bit messy.

clang rejects -mcpu=generic. It requires generic-rv32/rv64 for mcpu.
With no -mcpu, clang passes generic-rv32/rv64 as the CPU.
Without an explicit mtune clang only passes a cpu in the target-cpu attribute.
Clang accepts -mtune=generic, -mtune=generic-rv32, or -mtune=generic-rv64. These will be passed in the tune-cpu function attribute. These will use 3 different RISCVProcessor.td entries.

So -mtune=generic uses the "generic" entry RISCVProcessors.td, but no -mtune and -mcpu=generic or no -mcpu uses the generic-rv32/rv64 entry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be trying to resolve some of this strangeness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be trying to resolve some of this strangeness?

Done by #168612

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow-up to this: RISCVTuneProcessorModel<"generic", ...> is revert back to NoSchedModel, because any "generic" -- either -mcpu=generic or -mtune=generic -- will boil down to generic-rv32/rv64 now.

@mshockwave
Copy link
Member Author

Alright I just updated all the codegen tests (~700 of them got affected).

I noticed that test/CodeGen/RISCV/rvv/vfma-vp.ll...ran out of registers in one of its (LMUL=8) functions:

error: <unknown>:0:0: ran out of registers during register allocation in function 'vfma_vv_nxv32bf16'
error: <unknown>:0:0: ran out of registers during register allocation in function 'vfma_vv_nxv32bf16_unmasked'

This actually didn't abort the compilation but still raise an error. I marked this test as XFAIL at this moment.

@topperc
Copy link
Collaborator

topperc commented Nov 19, 2025

Alright I just updated all the codegen tests (~700 of them got affected).

I noticed that test/CodeGen/RISCV/rvv/vfma-vp.ll...ran out of registers in one of its (LMUL=8) functions:

error: <unknown>:0:0: ran out of registers during register allocation in function 'vfma_vv_nxv32bf16'
error: <unknown>:0:0: ran out of registers during register allocation in function 'vfma_vv_nxv32bf16_unmasked'

This actually didn't abort the compilation but still raise an error. I marked this test as XFAIL at this moment.

This error is because the schedule sank the copies from vector argument registers to virtual registers. This overconstrained register allocation. This is effectively the same issue as #41914 This patch fixes it by giving the COPY a hardware resource and latency:

diff --git a/llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td b/llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td
index 4271a6816e05..b6d4b1c44f00 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td
@@ -833,6 +833,8 @@ def : WriteRes<WriteCSR, [SMX60_IEU]>;
 def : WriteRes<WriteNop, [SMX60_IEU]>;
 def : WriteRes<WriteRdVLENB, [SMX60_IEUA]>;
 
+def : InstRW<[WriteIALU], (instrs COPY)>;
+
 //===----------------------------------------------------------------------===//
 // Bypass and advance
 def : ReadAdvance<ReadJmp, 0>;

@github-actions
Copy link

🐧 Linux x64 Test Results

  • 186424 tests passed
  • 4868 tests skipped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants